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Functional principal component analysis (FPCA) based on the 
Karhunen-Loeve decomposition has been successfully applied in many 
applications, mainly for one sample problems. In this paper we con- 
sider common functional principal components for two sample prob- 
lems. Our research is motivated not only by the theoretical challenge 
of this data situation, but also by the actual question of dynamics 
of implied volatility (IV) functions. For different maturities the log- 
returns of IVs are samples of (smooth) random functions and the 
methods proposed here study the similarities of their stochastic be- 
havior. First we present a new method for estimation of functional 
principal components from discrete noisy data. Next we present the 
two sample inference for FPCA and develop the two sample theory. 
We propose bootstrap tests for testing the equality of eigenvalues, 
eigenfunctions, and mean functions of two functional samples, illus- 
trate the test-properties by simulation study and apply the method 
to the IV analysis. 

1. Introduction. In many applications in biometrics, chemometrics, econo- 
metrics, etc., the data come from the observation of continuous phenomenons 
of time or space and can be assumed to represent a sample of i.i.d. smooth 
random functions Xi(t), . . . , X n (t) € L 2 [0, 1]. Functional data analysis has 
received considerable attention in the statistical literature during the last 
decade. In this context functional principal component analysis (FPCA) 
has proved to be a key technique. An early reference is Rao (1958), and im- 
portant methodological contributions have been given by various authors. 
Case studies and references, as well as methodological and algorithmical de- 
tails, can be found in the books by Ramsay and Silverman (2002, 2005) or 
Ferraty and Vieu (2006). 
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The well-known Karhunen-Loeve (KL) expansion provides a basic tool to 
describe the distribution of the random functions Xi and can be seen as the 
theoretical basis of FPCA. For v, w £ I? [0, 1] , let (v,w) = Jq 1 v(t)w(t) dt, and 
let || • ||= (•, -} 1 / 2 denote the usual L 2 -norm. With Ai > A2 > • ■ • and 71,72, •• • 
denoting eigenvalues and corresponding orthonormal eigenf unctions of the 
covariance operator T of JQ, we obtain Xi = fi + J2"^=i Prilr, i = 1, • ■ • , n, 
where [i = E(A",) is the mean function and (3 r i = (Xi — /x,7 r ) are (scalar) 
factor loadings with E(/3 2 j) = A r . Structure and dynamics of the random 
functions can be assessed by analyzing the "functional principal compo- 
nents" 7 r , as well as the distribution of the factor loadings. For a given 
functional sample, the unknown characteristics A r ,7 r are estimated by the 
eigenvalues and eigenfunctions of the empirical covariance operator T n of 
X\ , . . . , X n . Note that an eigenfunction 7 r is identified (up to sign) only if the 
corresponding eigenvalue A r has multiplicity one. This therefore establishes 
a necessary regularity condition for any inference based on an estimated 
functional principal component % in FPCA. Signs are arbitrary (7,. and (3 r i 
can be replaced by — 7 r and —(5 r i) and may be fixed by a suitable standard- 
ization. More detailed discussion on this topic and precise assumptions can 
be found in Section 2. 

In many important applications a small number of functional principal 
components will suffice to approximate the functions Xi with a high degree 
of accuracy. Indeed, FPCA plays a much more central role in functional data 
analysis than its well-known analogue in multivariate analysis. There are two 
major reasons. First, distributions on function spaces are complex objects, 
and the Karhunen-Loeve expansion seems to be the only practically feasible 
way to access their structure. Second, in multivariate analysis a substantial 
interpretation of principal components is often difficult and has to be based 
on vague arguments concerning the correlation of principal components with 
original variables. Such a problem does not at all exists in the functional 
context, where 71 (i), 72 (i), •• • are functions representing the major modes 
of variation of Xi(t) over t. 

In this paper we consider inference and tests of hypotheses on the struc- 
ture of functional principal components. Motivated by an application to 
implied volatility analysis, we will concentrate on the two sample case. A 
central point is the use of bootstrap procedures. We will show that the 
bootstrap methodology can also be applied to functional data. 

In Section 2 we start by discussing one-sample inference for FPCA. Basic 
results on asymptotic distributions have already been derived by 
Dauxois, Pousse and Romain (1982) in situations where the functions are di- 
rectly observable. Hall and Hosseini-Nasab (2006) develop asymptotic Tay- 
lor expansions of estimated eigenfunctions in terms of the difference ~V n — T. 
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Without deriving rigorous theoretical results, they also provide some qualita- 
tive arguments as well as simulation results motivating the use of bootstrap 
in order to construct confidence regions for principal components. 

In practice, the functions of interest are often not directly observed, but 
are regression curves which have to be reconstructed from discrete, noisy 
data. In this context the standard approach is to first estimate individual 
functions nonparametrically (e.g., by B-splines) and then to determine prin- 
cipal components of the resulting estimated empirical covariance operator — 
see Besse and Ramsay (1986), Ramsay and Dafzell (1991), among others. 
Approaches incorporating a smoothing step into the eigenanalysis have been 
proposed by Rice and Silverman (1991), Pezzulli and Silverman (1993) or 
Silverman (1996). Robust estimation of principal components has been con- 
sidered by Lacontore et al. (1999). Yao, Miiller and Wang (2005) and 
Hall, Miiller and Wang (2006) propose techniques based on nonparametric 
estimation of the covariance function E[{Aj(t) — /j,(t)}{Xi(s) — fJ,(s)}] which 
can also be applied if there are only a few scattered observations per curve. 

Section 2.1 presents a new method for estimation of functional princi- 
pal components. It consists in an adaptation of a technique introduced by 
Kneip and Utikal (2001) for the case of density functions. The key-idea is 
to represent the components of the Karhunen-Loeve expansion in terms of 
an (L 2 ) scalar-product matrix of the sample. We investigate the asymptotic 
properties of the proposed method. It is shown that under mild conditions 
the additional error caused by estimation from discrete, noisy data is first- 
order asymptotically negligible, and inference may proceed "as if" the func- 
tions were directly observed. Generalizing the results of 
Dauxois, Pousse and Romain (1982), we then present a theorem on the 
asymptotic distributions of the empirical eigenvalues and eigenf unctions. 
The structure of the asymptotic expansion derived in the theorem provides 
a basis to show consistency of bootstrap procedures. 

Section 3 deals with two-sample inference. We consider two independent 
samples of functions {^j and The problem of interest is 

to test in how far the distributions of these random functions coincide. The 
structure of the different distributions in function space can be accessed by 
means of the respective Karhunen-Loeve expansions 

oo 

xV=^ + J2P^\ P = l,2. 

r=l 

Differences in the distribution of these random functions will correspond to 
differences in the components of the respective KL expansions above. With- 
out restriction, one may require that signs are such that (7r ,7r } > 0. 
Two sample inference for FPCA in general has not been considered in the 
literature so far. In Section 3 we define bootstrap procedures for testing 
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the equality of mean functions, eigenvalues, eigenfunctions and eigenspaces. 
Consistency of the bootstrap is derived in Section 3.1, while Section 3.2 con- 
tains a simulation study providing insight into the finite sample performance 
of our tests. 

It is of particular interest to compare the functional components charac- 
terizing the two samples. If these factors are "common," this means j r := 
7r = 7r 2 "\ then only the factor loadings may vary across samples. This 
situation may be seen as a functional generalization of the concept of "com- 
mon principal components" as introduced by Flury (1988) in multivariate 
analysis. A weaker hypothesis may only require equality of the eigenspaces 
spanned by the first L € N functional principal components. [N denotes the 
set of all natural numbers 1,2,... (0 ^ N)]. If for both samples the common 
L-dimensional eigenspaces suffice to approximate the functions with high 
accuracy, then the distributions in function space are well represented by a 
low-dimensional factor model, and subsequent analysis may rely on compar- 
ing the multivariate distributions of the random vectors (/3 r ^, . . . ) T - 

The idea of "common functional principal components" is of considerable 
importance in implied volatility (IV) dynamics. This application is discussed 
in detail in Section 4. Implied volatility is obtained from the pricing model 
proposed by Black and Scholes (1973) and is a key parameter for quoting 
options prices. Our aim is to construct low-dimensional factor models for 
the log-returns of the IV functions of options with different maturities. In 
our application the first group of functional observations — }"=i; are 
log-returns on the maturity "1 month" (1M group) and second group — 
{Aj 2 '*}™i: 1 , are log-returns on the maturity "3 months" (3M group). 

The first three eigenfunctions (ordered with respect to the correspond- 
ing eigenvalues), estimated by the method described in Section 2.1, are 
plotted in Figure 1. The estimated eigenfunctions for both groups are of 
similar structure, which motivates a common FPCA approach. Based on 
discretized vectors of functional values, a (multivariate) common principal 
components analysis of implied volatilities has already been considered by 
Fengler, Hardle and Villa (2003). They rely on the methodology introduced 
by Flury (1988) which is based on maximum likelihood estimation under 
the assumption of multivariate normality. Our analysis overcomes the lim- 
itations of this approach by providing specific hypothesis tests in a fully 
functional setup. It will be shown in Section 4 that for both groups L = 3 
components suffice to explain 98.2% of the variability of the sample func- 
tions. An application of the tests developed in Section 3 does not reject the 
equality of the corresponding eigenspaces. 

2. Functional principal components and one sample inference. In this 
section we will focus on one sample of i.i.d. smooth random functions X±, . . . , 
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X n G L 2 [0, 1]. We will assume a well-defined mean function \x = E(JQ), as 
well as the existence of a continuous covariance function a(t, s) = ~E[{Xi(t) — 



ti(t)}{Xi(s)-n(8)}]. Then E(||X; 
ance operator T of Xi is given by 



Ml 



/ cx(i, t) dt < oo, and the covari- 



(Tu)(t) = / <r(t, s)v{s) ds, v £ L 2 [0, 1]. 

The Karhunen-Loeve decomposition provides a basic tool to describe the 
distribution of the random functions X$. With Ai > A2 > • ■ • and 71,72, • • • 
denoting eigenvalues and a corresponding complete orthonormal basis of 
eigenfunctions of T, we obtain 



(1) 



Xi 



A 4 



00 

£ 

r=l 



1, 



,n, 



where /3 r j = (Xj — /i, 7 r ) are uncorrelated (scalar) factor loadings with E(/3 r j) = 
0, E(/3^) = A r and E(f3 r if3ki) = for r 7^ A;. Structure and dynamics of the 
random functions can be assessed by analyzing the "functional principal 
components" j r , as well as the distribution of the factor loadings. 

A discussion of basic properties of (1) can, for example, be found in 
Gihman and Skorohod (1973). Under our assumptions, the infinite sums in 



(1) converge with probability 1, and K 



E X- 



I 2 ) < 00. Smooth- 



ness of Xi carries over to a corresponding degree of smoothness of a(t, s) 
and 7 r . If, with probability 1, Xi(t) is twice continuously differentiable, then 
a as well as j r are also twice continuously differentiable. The particular case 
of a Gaussian random function Xi implies that the (3 r i are independent 
N(0, A r ) -distributed random variables. 
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Fig. 1. Estimated eigenfunctions for 1M group in the left plot and 3M group in the right 
plot: solid — first function, dashed — second function, finely dashed — third function. 
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An important property of (1) consists in the known fact that the first L 
principal components provide a "best basis" for approximating the sample 
functions in terms of the integrated square error; see Ramsay and Silverman 
(2005), Section 6.2.3, among others. For any choice of L orthonormal basis 
functions V\ , . . . , vl , the mean integrated square error 



(2) 



p(v 1 ,...,v L )=E\ 



X; 



r=l 



V,V r )V 7 



is minimized by v r = j r 



2.1. Estimation of functional principal components. For a given sample 
an empirical analog of (1) can be constructed by using eigenvalues Ai > A 2 > 
• • • and orthonormal eigenfunctions 71 , 72, ■ • ■ of the empirical covariance 
operator r n , where 

(T n v)(t) = J a(t,s)v(s)ds, 

with X = n' 1 £™ = i Xi and a(t, s) = n" 1 E?=i{^i(*) - X(t)}{Xi(s) - X(s)} 
denoting sample mean and covariance function. Then 

n 

(3) Xi = X + ^2p ri %, i = l,...,n, 

r=l 

where f3 r i = (-y r ,Xi — X). We necessarily obtain n _1 J^i $ri = 0, n _1 J2i ftriPsi = 
for r / s, and n~ x J2i Pri = V- 

Analysis will have to concentrate on the leading principal components 
explaining the major part of the variance. In the following we will assume 
that Ai > A2 > • • • > A ro > A ro +i, where ro denotes the maximal number of 
components to be considered. For all r = 1, . . . ,ro, the corresponding eigen- 
function j r is then uniquely defined up to sign. Signs are arbitrary, decom- 
positions (1) or (3) may just as well be written in terms of —J r ,—f3ri or 
—$ri, and any suitable standardization may be applied by the statisti- 
cian. In order to ensure that % may be viewed as an estimator of 7 r rather 
than of — 7r, we will in the following only assume that signs are such that 
(7r,7r) > 0. More generally, any subsequent statement concerning differences 
of two eigenfunctions will be based on the condition of a nonnegative inner 
product. This does not impose any restriction and will go without saying. 

The results of Dauxois, Pousse and Romain (1982) imply that, under reg- 
ularity conditions, — j r \\ = O p {n~ 1 / 2 ), \X T — A r | = O p {n~ 1 / 2 ), as well as 
10* -0ri\= O p (n- 1 / 2 ) for all r < r . 

However, in practice, the sample functions Xi are often not directly ob- 
served, but have to be reconstructed from noisy observations Yij at discrete 
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design points ti k : 

(4) Y ik = X i (t ik )+e ik , k = l,...,Ti, 

where Ei k are independent noise terms with E(sjfc) = 0, Var(ejfc) = of- 

Our approach for estimating principal components is motivated by the 
well-known duality relation between row and column spaces of a data matrix; 
see Hardle and Simar (2003), Chapter 8, among others. In a first step this 
approach relies on estimating the elements of the matrix: 

(5) M lk = {X l -X,X k -X), i,fc = l,...,n. 

Some simple linear algebra shows that all nonzero eigenvalues Ai > A2 • ■ • of 

T n and £1 > I2 ■ ■ ■ of M are related by X r = l r /n, r = 1,2, When using the 

corresponding orthonormal eigenvectors pi,P2, ■ • ■ of M, the empirical scores 
Pri, as well as the empirical eigenfunctions j r , are obtained by f3 r i = \/%Pi r 
and 



(6) 7- 



r 



1 n 1 n 

-7j=^2Pir(Xi ~X) = —=Y^p ir Xi. 

Vh i=1 V«r i=1 



The elements of M are functionals which can be estimated with asym- 

— 1/2 

potically negligible bias and a parametric rate of convergence T i .If the 
data in (4) is generated from a balanced, equidistant design, then it is easily 
seen that for i 7^ j this rate of convergence is achieved by the estimator 

T 

M ij = T ~ X £(*i* " Y-h)(Yjk ~ Y-k), i ¥= h 
k=i 

and 

T 

M li = T- l J2(^k-Y. k ) 2 -af, 

k=l 

where af denotes some nonparametric estimator of variance and Y. k = n _1 x 

In the case of a random design some adjustment is necessary: Define the 
ordered sample f^m < ^(2) < • • • < h(TA °f design points, and for j = 1, . . . , Tj, 
let denote the observation belonging to tjQ). With ij( ) = — an d 
k(Ti+x) = 2-^), set 



te [0,1], 



where /(•) denotes the indicator function, and for i 7^ j, define the estimate 
of Mij by 

Mij = f {Xi(t) - x{t)}{Xj{t) ~ X(t)}dt, 



<s 
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where x(t) = n 1 E"=l Finally, by redefining = -i i(2) and t i(T . +1) = 
2 - t i{Ti) , set x*(t) = E*l 2 € f^-^*™ , )), t g [0,1]. 

Then construct estimators of the diagonal terms Mjj by 

(7) M 4i = r{Xi(*)-x(t)}{Xi(t)-x(*)}d*. 

Jo 

The aim of using the estimator (7) for the diagonal terms is to avoid the 
additional bias implied by E e (Y^|) = Xi(tij) 2 + of . Here E £ denotes con- 
ditional expectation given Xj. Alternatively, we can construct a bias 
corrected estimator using some nonparametric estimation of variance of, 
for example, the difference based model-free variance estimators studied in 
Hall, Kay and Titterington (1990) can be employed. 

The eigenvalues l\ > 1% ■ ■ ■ and eigenvectors pi,p2,... of the resulting ma- 
trix M then provide estimates \ t -t = l r / n and j3 r i-T = ytrPir of V and /3 r j. 
Estimates 7 r; T of the empirical functional principal component % can be 
determined from (6) when replacing the unknown true functions Xi by non- 
parametric estimates Xi (as, for example, local polynomial estimates) with 
smoothing parameter (bandwidth) b: 

1 n 

(8) 7r;T = —F^ P irXi • 

y l r i=i 

When considering (8), it is important to note that j r -T is defined as a 
weighted average of all estimated sample functions. Averaging reduces vari- 
ance, and efficient estimation of % therefore requires under smoothing of 
individual function estimates Xi. Theoretical results are given in Theorem 
1 below. Indeed, if, for example, n and T = rninj Tj are of the same order 
of magnitude, then under suitable additional regularity conditions it will be 
shown that for an optimal choice of a smoothing parameter b ~ (raT) -1 / 5 
and twice continuously differentiable Xi, we obtain the rate of convergence 
Hiv — %;t\\ = O p {(nT)^ 2 ^}. Note, however, that the bias corrected esti- 
mator (7) may yield negative eigenvalues. In practice, these values will be 
small and will have to be interpreted as zero. Furthermore, the eigenfunc- 
tions determined by (8) may not be exactly orthogonal. Again, when using 
reasonable bandwidths, this effect will be small, but of course (8) may by 
followed by a suitable orthogonalization procedure. 

It is of interest to compare our procedure to more standard methods 
for estimating A r and % as mentioned above. When evaluating eigenvalues 
and eigenfunctions of the empirical covariance operator of nonparametrically 
estimated curves Xi, then for fixed r < ro the above rate of convergence for 
the estimated eigenfunctions may well be achieved for a suitable choice of 
smoothing parameters (e.g., number of basis functions). But as will be seen 



COMMON FUNCTIONAL PC 



9 



from Theorem 1, our approach also implies that |A r — ^| = O v {T~ l + n _1 ). 
When using standard methods it does not seem to be possible to obtain 
a corresponding rate of convergence, since any smoothing bias |E[JQ(t)] — 
Xi (t) | will invariably affect the quality of the corresponding estimate of A r . 

We want to emphasize that any finite sample interpretation will require 
that T is sufficiently large such that our nonparametric reconstructions of 
individual curves can be assumed to possess a fairly small bias. The above ar- 
guments do not apply to extremely sparse designs with very few observations 
per curve [see Hall, Miiller and Wang (2006) for an FPCA methodology fo- 
cusing on sparse data]. 

Note that, in addition to (8), our final estimate of the empirical mean 
function p, = X will be given by lit = n~ l Y^i Xj. A straightforward approach 
to determine a suitable bandwidth b consists in a "leave-one-individual-out" 
cross-validation. For the maximal number of components to be considered, 
let fiT-i and %-T,-i, t = 1, . . . ,ro, denote the estimates of fx and j r obtained 
from the data (Yij,t[j), I = 1, . . . ,i— 1, . . . , n, j = 1, . . . , Tf.. By (8), these 
estimates depend on b, and one may approximate an optimal smoothing 
parameter by minimizing 

r r -\ 2 

i j \ r=l ) 

over b, where i) r j denote ordinary least squares estimates of /3 r j. A more 
sophisticated version of this method may even allow to select different band- 
widths b r when estimating different functional principal components by (8). 
Although, under certain regularity conditions, the same qualitative rates 
of convergence hold for any arbitrary fixed r < tq, the quality of estimates 
decreases when r becomes large. Due to (7 s ,7r) = for s < r, the number 
of zero crossings, peaks and valleys of j r has to increase with r. Hence, in 
tendency 7 r will be less and less smooth as r increases. At the same time, 
X r — > 0, which means that for large r the rth eigenfunctions will only possess 
a very small influence on the structure of X{. This in turn means that the 
relative importance of the error terms in (4) on the structure of ^ t -t will 
increase with r. 

2.2. One sample inference. Clearly, in the framework described by (1)- 
(4) we are faced with two sources of variability of estimated functional prin- 
cipal components. Due to sampling variation, j r will differ from the true 
component j r , and due to (4), there will exist an additional estimation er- 
ror when approximating % by j r] T- 

The following theorems quantify the order of magnitude of these different 
types of error. Our theoretical results are based on the following assumptions 
on the structure of the random functions Xi . 
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Assumption 1. Xi, . . . , X n e L 2 [0, 1] is an i.i.d. sample of random func- 
tions with mean \i and continuous covariance function a(t,s), and (1) holds 
for a system of eigenfunctions satisfying sup s6N sup t6 r 01 i |7s(i)| < oo. Fur- 

thermore, E~i Hffi fy] < oo and T,T=i^T=i E lAPM < oo for all 

r EN. 

Recall that E[/3 r j] = and E[/3 r j/3 s j] = for r / s. Note that the assump- 
tion on the factor loadings is necessarily fulfilled if Xj are Gaussian random 
functions. Then /3 r i and f3 S i are independent for r ^ s, all moments of (3 r i 
are finite, and hence E[/3^/3 g j/3 s j] = for q / s, as well as E[/^/3^] = A r A s 
for r 7^ s; see Gihman and Skorohod (1973). 

We need some further assumptions concerning smoothness of Xj and the 
structure of the discrete model (4). 

Assumption 2. (a) Xj is a.s. twice continuously differentiable. There 
exists a constant D\ < oo such that the derivatives are bounded by 
sup t E[X£(i) 4 ] < D 1} as well as sup t E[Xf (f) 4 ] < D x . 

(b) The design points t^, i = l,...,n, fe = l, ...,Tj, are i.i.d. random 
variables which are independent of Xj and e^. The corresponding design 
density / is continuous on [0, 1] and satisfies inf tg [ 0i i] fit) > 0. 

(c) For any i, the error terms are i.i.d. zero mean random variables 
with Var(ejfc) = a 2 . Furthermore, is independent of Xj, and there exists 
a constant D2 such that E(e| fc ) < D2 for all i, k. 

(d) The estimates Xj used in (8) are determined by either a local linear or 
a Nadaraya- Watson kernel estimator with smoothing parameter b and kernel 
function K. K is a continuous probability density which is symmetric at 0. 

The following theorems provide asymptotic results as n,T ^ 00, where 
T = minf =1 {r 4 }. 

Theorem 1. In addition to Assumptions 1 and 2, assume that inf s ^ r |A r - 
A s I > holds for some r = 1,2, Then we have the following: 



(9) 



(i) n- 1 E?=i0H - /W) 2 = OpiT" 1 ) and 



Xf — — 

n 



OpiT^ + n- 1 ). 



(ii) If additionally 6^0 and (Tb) 1 — ► as n,T — > 00, i/ien /or t € 
[0,1], 

(10) |7r(i) - %r(t)\ = O p {b 2 + (nTby 1 / 2 + (Tb 1 / 2 )- 1 + n" 1 }. 
^4 proof is given in the Appendix. 
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Theorem 2. Under Assumption 1 we obtain the following: 
(i) For allte [0,1], 

^{X(t)-MO} = E{^E/ 3 «}7,W-^(o,E A ^r(t) 2 ). 

//, furthermore, A r _i > A r > A r+ i holds for some fixed r£ {1,2,.. .}, then 
(ii) 

1 n 

(11) v^(A r - X r ) = T E(^ - A r ) + ^ iv(o, A r ), 



w/iere A r =E[(^-A r ) 2 ], 
(iii) and for all t £ [0, 1] 



7r (t) - 7r (t) = E I ^ Z X S E M " [7- (*) + (*), 

(12) 



/ I re(A r — A s ) . 



where \\ R r \\ = O p (n ). 



Moreover, 



A proof can be found in the Appendix. The theorem provides a general- 
ization of the results of Dauxois, Pousse and Romain (1982) who derive ex- 
plicit asymptotic distributions by assuming Gaussian random functions X{. 

Note that in this case A r = 2A 2 and £ g#r S 8#r ^-If^t-x^ hs (*) = 

When evaluating the bandwidth-dependent terms in (10), best rates of 
convergence \%(t) - %;T\t)\ = O p {(nT)~ 2 / b + T~ 4 / 5 + n -1 } are achieved 
when choosing an undersmoothing bandwidth h ~ max{(nT) -1 / 5 , T -2 / 5 }. 
Theoretical work in functional data analysis is usually based on the implicit 
assumption that the additional error due to (4) is negligible, and that one 
can proceed "as if" the functions X{ were directly observed. In view of 
Theorems 1 and 2, this approach is justified in the following situations: 

(1) T is much larger than re, that is, n/T 4//5 — > 0, and the smoothing 
parameter b in (8) is of order T" 1 / 5 (optimal smoothing of individual func- 
tions). 
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(2) An undersmoothing bandwidth b ~ maxKnT)" 1 / 5 , T -2 / 5 } is used and 
n/T 8 / 5 — ► 0. This means that T may be smaller than n, but T must be at 
least of order of magnitude larger than ri 5 / 8 . 

In both cases (1) and (2) the above theorems imply that |A r — = o p (\X r — 
X r \), as well as ||7 r — 7 r; r|| = Cp(||7r — 7r||)- Inference about functional prin- 
cipal components will then be first-order equivalent to an inference based 
on known functions Xj. 

In such situations Theorem 2 suggests bootstrap procedures as tools for 
one sample inference. For example, the distribution of \\% — j r \\ may by 
approximated by the bootstrap distribution of — 7 r ||, where 7* are es- 
timates to be obtained from i.i.d. bootstrap resamples X\ , X%, ■ ■ ■ , X* of 
{X%, X2, . . . , X n }. This means that X\ = X^ , . . . , AT* = Xi n for some indices 
i±,...,i n drawn independently and with replacement from {l,...,n} and, 
in practice, 7* may thus be approximated from corresponding discrete data 
(Yhj,Uij)j=i,...,Ti (Yi n j,t in j)j=i,...,T ln ■ The additional error is negligible 
if either (1) or (2) is satisfied. 

One may wonder about the validity of such a bootstrap. Functions are 
complex objects and there is no established result in bootstrap theory which 
readily generalizes to samples of random functions. But by (1), i.i.d. boot- 
strap resamples {-Aj*}i=i,...,n may be equivalently represented by correspond- 
ing, i.i.d. resamples • ■ -}i=i,...,n of factor loadings. Standard multi- 
variate bootstrap theorems imply that for any gSN the distribution of mo- 
ments of the random vectors (f3u, . . . , (3 q i) may be consistently approximated 
by the bootstrap distribution of corresponding moments of . . . , To- 
gether with some straightforward limit arguments as q — ► 00, the structure of 
the first-order terms in the asymptotic expansions (11) and (12) then allows 
to establish consistency of the functional bootstrap. These arguments will 
be made precise in the proof of Theorem 3 below, which concerns related 
bootstrap statistics in two sample problems. 

Remark. Theorem 2(iii) implies that the variance of % is large if one of 
the differences A r _i — A r or A r — A r +i is small. In the limit case of eigenval- 
ues of multiplicity m > 1 our theory does not apply. Note that then only the 
m-dimensional eigenspace is identified, but not a particular basis (eigenfunc- 
tions). In multivariate PCA Tyler (1981) provides some inference results on 
corresponding projection matrices assuming that A^. ^> X r ^i ^ * * * ^ Xr~\~rn ^ 
A r + m +i for known values of r and m. 

Although the existence of eigenvalues A r , r < ro, with multiplicity m > 1 
may be considered as a degenerate case, it is immediately seen that A r — > 
and, hence, A r — A r +i — > as r increases. Even in the case of fully observed 
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functions Xj , estimates of eigenfunctions corresponding to very small eigen- 
values will thus be poor. The problem of determining a sensible upper limit 
of the number r$ of principal components to be analyzed is addressed in 
Hall and Hosseini-Nasab (2006). 

3. Two sample inference. The comparison of functional components across 
groups leads naturally to two sample problems. Thus, let 

JffUf' Jfi' and xi 2 \x?\...,xW 

denote two independent samples of smooth functions. The problem of inter- 
est is to test in how far the distributions of these random functions coincide. 
The structure of the different distributions in function space can be accessed 
by means of the respective Karhunen-Loeve decompositions. The problem 
to be considered then translates into testing equality of the different com- 
ponents of these decompositions given by 

oo 

(13) X^=^+J2P^\ p=l,2, 



(p) 

where again jr are the eigenfunctions of the respective covariance operator 
rW corresponding to the eigenvalues = E{(/?{f) 2 } > A^ } = E{((3$) 2 } > 

■ ■ ■. We will again suppose that X^_i > X^ > A^ l5 p = 1,2, for all r < r$ 
components to be considered. Without restriction, we will additionally as- 
sume that signs are such that (7^,7^ ) > 0, as well as (Tr^Tr ) > 0. 

It is of great interest to detect possible variations in the functional compo- 
nents characterizing the two samples in (13). Significant difference may give 
rise to substantial interpretation. Important hypotheses to be considered 
thus are as follows: 

F 0l :/i (1) =/i (2) and H 02 r : 7 « = 7 f ) , r<r . 

Hypothesis Hq 2 r is of particular importance. Then jr = 7r 2 ^ and only the 
factor loadings [3 r i may vary across samples. If, for example, Hq 2 t is ac- 
cepted, one may additionally want to test hypotheses about the distribu- 
tions of p%\ p = 1, 2. Recall that necessarily E{/3^ } } = 0, E{(3^ ] } 2 = X^ ] , 
and pj^ is uncorrelated with (3^ if r 7^ s. If the are Gaussian random 

variables, the are independent N(0,X r ) random variables. A natural 
hypothesis to be tested then refers to the equality of variances: 

Fo3 ir :A« = A( 2 ), r = l,2,.... 

Let A (p) (*) = ^EiX^(t), and let X? > X^ > ■ ■ ■ and de- 
note eigenvalues and corresponding eigenfunctions of the empirical covari- 
ance operator f n} of X , X^ (t) , . . . , XnJ ■ The following test statistics are 
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defined in terms of fi^ , Xr and • As discussed in the proceeding section, 
all curves in both samples are usually not directly observed, but have to be 
reconstructed from noisy observations according to (4). In this situation, the 
"true" empirical eigenvalues and eigenfunctions have to be replaced by their 

discrete sample estimates. Bootstrap estimates are obtained by resampling 

(v) 

the observations corresponding to the unknown curves X!> . As discussed in 
Section 2.2, the validity of our test procedures is then based on the assump- 
tion that T is sufficiently large such that the additional estimation error is 
asymptotically negligible. 

Our tests of the hypotheses Hq 1 , Hq 2 t and Hq 3 t rely on the statistics 

The respective null-hypothesis has to be rejected if D\ > Ai ; i_ a , Z?2,r > 
A2,r;i-a or ^3,r > ^3,r;i-«, where Ai ; i_ a , A2,r;i-a and A3 )r; i_ a denote the 
critical values of the distributions of 

A2,r d = t \\^-^-(^-^)\\ 2 , 

A 3 /= f |A«-A«-(A( 2 )-A( 2 ))| 2 . 

Of course, the distributions of the different A's cannot be accessed directly, 
since they depend on the unknown true population mean, eigenvalues and 
eigenfunctions. However, it will be shown below that these distributions and, 
hence, their critical values are approximated by the bootstrap distribution 
of 

A .^|| A (l>_ A (l)_ (/i (2)._ A (2) ) ||2 j 
^,r = \\lP* -#-(#*- 7^)l| 2 , 

where fi^* , Jr , Ar , as well as /}( 2 )* , j^* , A^ , are estimates to be 
obtained from independent bootstrap samples X\* (t) , X\* {t) , . . . ,X^*(t), as 
wellasX 2 *(t),X 2 *(t),...,X 2 *(t). 

This test procedure is motivated by the following insights: 

(1) Under each of our null-hypotheses the respective test statistics D is 
equal to the corresponding A. The test will thus asymptotically possess the 
correct level: P(D > Ai_ a ) « a. 
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(2) If the null hypothesis is false, then D / A. Compared to the distribu- 
tion of A, the distribution of D is shifted by the difference in the true means, 
eigenfunctions or eigenvalues. In tendency D will be larger than A\— a . 

Let 1 < L < tq. Even if for r < L the equality of eigenfunctions is rejected, 
we may be interested in the question of whether at least the L-dimensional 
eigenspaces generated by the first L eigenfunctions are identical. Therefore, 
let £r , as well as £^\ denote the L-dimensional linear function spaces 

generated by the eigenfunctions 7^ , . . . , 7^ and 7^ , . . . , 7^ , respectively. 
We then aim to test the null hypothesis: 



:£ 



(l) 



£ 



(2) 



! 4 ,l ~"-L 

Of course, Hq 4 l corresponds to the hypothesis that the operators projecting 

into £^ and £^ are identical. This in turn translates into the condition 
that 

E jP (thP (s) = E lP (t)4 2) (s) for all t,s & [0, 1] . 

r=l r=l 

Similar to above, a suitable test statistic is given by 

D ^ = J J I E ^ (*)# (s) - E # (*)# (s) Xdtds 

and the null hypothesis is rejected if D^l > A4 j x, ; i_ a , where A4 j £ ; i_ Q de- 
notes the critical value of the distribution of 



dcf 



L 
Lr=l 



E{#(^ 2) («)-#(^ 2) ( S )} 



r=l 



dtds. 



The distribution of A^/, and, hence, its critical values are approximated 
by the bootstrap distribution of 



A 



dcf 



4,i 



r=l 



r=l 



(it (is. 



It will be shown in Theorem 3 below that under the null hypothesis, as well as 
under the alternative, the distributions of nAi, nA2, r , nA^^, hA^l converge 
to continuous limit distributions which can be consistently approximated by 
the bootstrap distributions of nA\, nA^, nA^ r , nA| L . 
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3.1. Theoretical results. Let n = {n-i+n-z) /2. We will assume that asymp- 
totically n\ = n- q± and n 2 = n ■ q 2 for some fixed proportions q\ and q 2 . We 
will then study the asymptotic behavior of our statistics as n — ► 00. 

We will use X x = {x{ 1} , . . .,X$] and X 2 = ■ .,X$} to denote 

the observed samples of random functions. 

Theorem 3. Assume that {X[ 1] , . . . , X$} and {x{ 2) , . . . , xj$} are two 
independent samples of random functions, each of which satisfies Assump- 
tion 1. As n — > 00 we then obtain the following: 

(i) There exists a nondegenerated, continuous probability distribution F\ 

£ 

such that nAj — ► F\, and for any 5 > 0, 

\P(nA 1 >S)- P{nA\ > S\X 1 ,X 2 )\ = O p (l). 

(ii) // ; furthermore, A^ > xi 1] > X% and A^i > A^ 2) > A^i hold for 
some fixed r = 1, 2, . . . , there exist a nondegenerated, continuous probability 

£ 

distributions F kr such that nA\. r — > F^ r , k = 2, 3, and for any 5 > 0, 
|P(nA,, r >S)- P(nA* k)r > 5\X U X 2 )\ = O p (l), k = 2,3. 

(iii) // Af 1} > A^i > and xi 2) > A^i > hold for all r = 1, . . . , L, there 
exists a nondegenerated, continuous probability distribution F^l such that 

£ 

nA^L — ► F^l, and for any 5 > 0, 

|P(nA 4 , L >S)- P(nAX L > S\X X ,X 2 )\ = o p (l). 

The structures of the distributions F%, F 2>r , F^ r , F^i are derived in the 
proof of the theorem which can be found in the Appendix. They are obtained 
as limits of distributions of quadratic forms. 

3.2. Simulation study. In this paragraph we illustrate the finite behavior 
of the proposed test. The basic simulation-setup (setup "a") is established 
as follows: the first sample is generated by the random combination of or- 
thonormalized sine and cosine functions (Fourier functions) and the second 
sample is generated by the random combination of the same but shifted 
factor functions: 

X\ l \ti k ) = /3i 4 1) ^2sin(2^ ifc ) + /^ ) v / 2cos(27r^), 

Xf\ti k ) = f3[ 2) V2sm{2ir(t tk + 5)} + /?g ) v / 2cos{27r(t ife + 5)}. 

The factor loadings are i.i.d. random variables with pf) ~iV(0, Af 3 ) and 

~ N(0, X ( 2 p) ). The functions are generated on the equidistant grid = 
tk = k/T, k = 1, . . .T = 100, i = 1, . . . ,n = 70. The simulation setup is based 
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Table 1 

The results of the simulations for a = 0.1, n = 70, T = 100, number of simulations 250 



Setup/shift 





0.05 


0.1 


0.15 


0.2 


0.25 


(a) 10, 5, 8, 4 


0.13 


0.41 


0.85 


0.96 


1 


1 


(a) 4, 2, 2, 1 


0.12 


0.48 


0.87 


0.96 


1 


1 


(a) 2, 1, 1.5, 2 


0.14 


0.372 


0.704 


0.872 


0.92 


0.9 


(b) 10, 5, 8, 4 Di 


0.10 


0.44 


0.86 


0.95 


1 


1 


(b) 10, 5, 8, 4 D 2 


1 


1 


1 


1 


1 


1 



on the fact that the error of the estimation of the eigenfunctions simulated 
by sine and cosine functions is, in particular, manifested by some shift of 
the estimated eigenfunctions. The focus of this simulation study is the test 
of common eigenfunctions. 

For the presentation of results in Table 1, we use the following notation: 

"(a) \i , A 2 , A 2 2 \ A 2 ." The shift parameter 5 is changing from to 0.25 
with the step 0.05. It should be mentioned that the shift 5 = yields the 
simulation of level and setup with shift 5 = 0.25 yields the simulation of the 
alternative, where the two factor functions are exchanged. 

In the second setup (setup "b") the first factor functions are the same 
and the second factor functions differ: 

X\ l \t ik ) = V2sm{2irt lk ) + $ yfi cos (2vrt ifc ), 

Xf\t ik ) = f3 { $ v / 2sin{27r(t ifc + 5)} + $f v / 2sin{47r(t ifc + 5)}. 

In Table 1 we use the notation "(b) A^, A 2 , A 2 2 \ A 2 2 \ D r ." D r means the 
test for the equality of the rth eigenfunction. In the bootstrap tests we used 
500 bootstrap replications. The critical level in this simulation is a = 0.1. 
The number of simulations is 250. 

We can interpret Table 1 in the following way: In power simulations (5 7^ 0) 
test behaves as expected: less powerful if the functions are "hardly distin- 
guishable" (small shift, small difference in eigenvalues). The level approxima- 
tion seems to be less precise if the difference in the eing envalues (A^ } -A 2 p) ) 
becomes smaller. This can be explained by relative small sample-size n, small 
number of bootstrap-replications and increasing estimation-error as argued 
in Theorem 2, assertion (iii). 

In comparison to our general setup (4), we used an equidistant and 
common design for all functions. This simplification is necessary, it sim- 
plifies and speeds-up the simulations, in particular, using general random 
and observation-specific design makes the simulation computationally un- 
tractable. 

Second, we omitted the additional observation error, this corresponds to 
the standard assumptions in the functional principal components theory. As 
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Table 2 

The results of the simulation for a = 0.1, n = 70, T = 100 with additional error in 

observation 



Setup/shift 





0.05 


0.1 


0.15 


0.2 


0.25 


(a) 10, 5, 8, 4 


0.09 


0.35 


0.64 


0.92 


0.94 


0.97 



argued in Section 2.2, the inference based on the directly observed functions 
and estimated functions X{ is first-order equivalent under mild conditions 
implied by Theorems 1 and 2. In order to illustrate this theoretical result in 
the simulation, we used the following setup: 

X^iUk) = p[] ] V2sm{2TTt ik ) + V2cos(2irt ik ) + , 

Xi 2) (tik) = /? 1 - ) v / 2sin{27r(t ifc + 6)} + 0® v 7 ^ cos{2vr(t ifc + 8)} + e$ , 

where e^j) ~ iV(0, 0.25), p = 1,2, all other parameters remain the same as 
in the simulation setup "a." Using this setup, we recalculate the simulation 

presented in the second "row" of Table 1, for estimation of the functions 

(p) 

Xy ,p= 1,2, we used the Nadaraya-Watson estimation with Epanechnikov 
kernel and bandwidth b = 0.05. We run the simulations with various band- 
widths, the choice of the bandwidth does not have a strong influence on 
results except by oversmoothing (large bandwidths) . The results are printed 
in Table 2. As we can see, the difference of the simulation results using es- 
timated functions is not significant in comparison to the results printed in 
the second line of Table 1 — directly observed functional values. 

The last limitation of this simulation study is the choice of a partic- 
ular alternative. A more general setup of this simulation study might be 
based on the following model: X^\t) = P^j[ 1} (t) + f3^ j ) ^\t), xf\t) = 
Pvili^it) +/^72^(*)' where 7i ,7i , 72 an< ^ 9 are mutually orthogonal 
functions on L 2 [0, 1] and 7^ = (1 + ^ 2 )~ 1 ^ 2 {7^ + vg]. Basically we create 

the alternative by the contamination of one of the "eigenfunctions" (in our 

(2) 

case the second one) in the direction g and ensure H72 || = 1. The amount 
of the contamination is controlled by the parameter v. Note that the exact 
squared integral difference H7I — T2 1 1 2 does not depend on function g. 
Thus, in the "functional sense" particular "direction of the alternative hy- 
pothesis" represented by the function g has no impact on the power of the 
test. However, since we are using a nonparametric estimation technique, we 
might expect that rough (highly fluctuating) functions g will yield higher er- 
ror of estimation and, hence, decrease the precision (and power) of the test. 
Finally, a higher number of factor functions (L) in simulation may cause less 
precise approximation of critical values and more bootstrap replications and 
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larger sample-size may be needed. This can also be expected from Theorem 
2 in Section 2.2 — the variance of the estimated eigenfunctions depends on 
all eigenfunctions corresponding to nonzero eingenvalues. 

4. Implied volatility analysis. In this section we present an application 
of the method discussed in previous sections to the implied volatilities of Eu- 
ropean options on the German stock index (ODAX). Implied volatilities are 
derived from the Black-Scholes (BS) pricing formula for European options; 
see Black and Scholes (1973). European call and put options are derivatives 
written on an underlying asset with price process Si , which yield the pay-off 
max(S7 — K, 0) and max(K — Si, 0), respectively. Here i denotes the current 
day, / the expiration day and K the strike price. Time to maturity is defined 
as t = I — i. The BS pricing formula for a Call option is 

(14) C i {S i ,K,T,r,a) = S i ${d 1 )-Ke- rT ®{d 2 ), 

where d± = ln ( ,s »/- g H^_+ cr ^ r , d 2 =d\ — 0\[r, r is the risk-free interest rate, 
a is the (unknown and constant) volatility parameter, and <3? denotes the 
c.d.f. of a standard normal distributed random variable. In (14) we assume 
the zero-dividend case. The Put option price p can be obtained from the 
put-call parity p = d — Si + e~ Tr K. 

The implied volatility a is defined as the volatility a, for which the BS 
price Ci in (14) equals the price C, observed on the market. For a single 
asset, we obtain at each time point (day i) and for each maturity r a IV 
function aJ(K). Practitioners often rescale the strike dimension by plotting 
this surface in terms of (futures) moneyness k = K/Fi(r), where P(t) = 

Clearly, for given parameters Si ,r,K,r the mapping from prices to IVs is 
a one-to-one mapping. The IV is often used for quoting the European options 
in financial practice, since it reflects the "uncertainty" of the financial market 
better than the option prices. It is also known that if the stock price drops, 
the IV raises (so-called leverage effect), motivates hedging strategies based 
on IVs. Consequently, for the purpose of this application, we will regard the 
BS-IV as an individual financial variable. The practical relevance of such 
an approach is justified by the volatility based financial products such as 
VDAX, which are commonly traded on the option markets. 

The goal of this analysis is to study the dynamics of the IV functions for 
different maturities. More specifically, our aim is to construct low dimen- 
sional factor models based on the truncated Karhunen-Loeve expansions 
(1) for the log-returns of the IV functions of options with different maturi- 
ties and compare these factor models using the methodology presented in 
the previous sections. Analysis of IVs based on a low-dimensional factor 
model gives directly a descriptive insight into the structure of distribution 



20 



M. BENKO, W. HARDLE AND A. KNEIP 



of the log-IV-returns — structure of the factors and empirical distribution of 
the factor loadings may be a good starting point for further pricing models. 
In practice, such a factor model can also be used in Monte Carlo based pric- 
ing methods and for risk-management (hedging) purposes. For comprehen- 
sive monographs on IV and IV-factor models, see Hafner (2004) or Fengler 
(2005b). 

The idea of constructing and analyzing the factor models for log-IV- 
returns for different maturities was originally proposed by 
Fengler, Hardle and Villa (2003), who studied the dynamics of the IV via 
PCA on discretized IV functions for different maturity groups and tested the 
Common Principal Components (CPC) hypotheses (equality of eigenvectors 
and eigenspaces for different groups). Fengler, Hardle and Villa (2003) pro- 
posed a PCA-based factor model for log-IV-returns on (short) maturities 
1, 2 and 3 months and grid of moneyness [0.85,0.9,0.95,1,1.05,1.1]. They 
showed that the factor functions do not significantly differ and only the 
factor loadings differ across maturity groups. Their method relies on the 
CPC methodology introduced by Flury (1988) which is based on maximum 
likelihood estimation under the assumption of multivariate normality. The 
log-IV-returns are extracted by the two-dimensional Nadaraya- Watson es- 
timate. 

The main aim of this application is to reconsider their results in a func- 
tional sense. Doing so, we overcome two basic weaknesses of their approach. 
First, the factor model proposed by Fengler, Hardle and Villa (2003) is per- 
formed only on a sparse design of moneyness. However, in practice (e.g., 
in Monte Carlo pricing methods), evaluation of the model on a fine grid is 
needed. Using the functional PCA approach, we may overcome this difficulty 
and evaluate the factor model on an arbitrary fine grid. The second difficulty 
of the procedure proposed by Fengler, Hardle and Villa (2003) stems from 
the data design — on the exchange we cannot observe options with desired 
maturity on each day and we need to estimate them from the IV-functions 
with maturities observed on the particular day. Consequently, the two- 
dimensional Nadaraya- Watson estimator proposed by Fengler, Hardle and Villa 
(2003) results essentially in the (weighted) average of the IVs (with clos- 
est maturities) observed on a particular day, which may affect the test 
of the common eigenfunction hypothesis. We use the linear interpolation 

scheme in the total variance 0TOTi( K ' r ) ^ {uJ{k)) 2 t, in order to recover 
the IV functions with fixed maturity (on day i). This interpolation scheme is 
based on the arbitrage arguments originally proposed by Kahale (2004) for 
zero-dividend and zero-interest rate case and generalized for deterministic 
interest rate by Fengler (2005a). More precisely, having IVs with matu- 
rities observed on a particular day i: a^ H (n), ji = l,...,p Ti , we calculate 
the corresponding total variance o"TOT,i( K ) r jJ- From these total variances 
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we linearly interpolate the total variance with the desired maturity from 
the nearest maturities observed on day i. The total variance can be easily 
transformed to corresponding IV oJ{n). As the last step, we calculate the 

log-returns A log cr?" (re) = log (re) — log ct^(k). The log-IV-returns are ob- 
served for each maturity r on a discrete grid ttj k . We assume that observed 
log-IV-return A log &l(Kj k ) consists of true log-return of the IV function 
denoted by Alogcr^(re[ fc ) and possibly of some additional error e[ k . By set- 
ting Y? k := Alog(T t T (ft[ fc ), XJ(k) := A log cr?" (re), we obtain an analogue of the 
model (4) with the argument re: 

(15) Yf k = XT(K ik )+eJ k , i = l,...,n T . 

In order to simplify the notation and make the connection with the theoret- 
ical part clear, we will use the notation of (15). 

For our analysis we use a recent data set containing daily data from 
January 2004 to June 2004 from the German-Swiss exchange (EUREX). 
Violations of the arbitrage-free assumptions ( "obvious" errors in data) were 
corrected using the procedure proposed by Fengler (2005a). Similarly to 
Fengler, Hardle and Villa (2003), we excluded options with maturity smaller 
then 10 days, since these option-prices are known to be very noisy, par- 
tially because of a special and arbitrary setup in the pricing systems of the 
dealers. Using the interpolation scheme described above, we calculate the 
log-IV-returns for two maturity groups: "IM" group with maturity r = 0.12 
(measured in years) and "3M" group with maturity r = 0.36. The observed 
log-IV-returns are denoted by Y> k M \ k = l,..., K} M , F| m , k = 1, . . . , Kf M . 
Since we ensured that for no i, the interpolation procedure uses data with 
the same maturity for both groups, this procedure has no impact on the 
independence of both samples. 

The underlying models based on the truncated version of (3) are as fol- 
lows: 

(16) X} m {k) = X 1M ( K ) + £ ftf^ 1M (K), i = 1, • ■ -,niM, 

r=l 

(17) Xf M (n) = X* m (k) + ]T m^ M (^ i = 1, • ■ • , n 3M . 

r=l 

Models (16) and (17) can serve, for example, in a Monte Carlo pricing tool 
in the risk management for pricing exotic options where the whole path of 
implied volatilities is needed to determine the price. Estimating the factor 
functions in (16) and (17) by eigenfunctions displayed in Figure 1, we only 

need to fit the (estimated) factor loadings fij^ and /3|^ . The pillar of the 
model is the dimension reduction. Keeping the factor function fixed for a 
certain time period, we need to analyze (two) multivariate random processes 
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of the factor loadings. For the purposes of this paper we will focus on the 
comparison of factors from models (16) and (17) and the technical details of 
the factor loading analysis will not be discussed here, since in this respect 
we refer to Fengler, Hardle and Villa (2003), who proposed to fit the factor 
loadings by centered normal distributions with diagonal variance matrix 
containing the corresponding eigenvalues. For a deeper discussion of the 
fitting of factor loadings using a more sophisticated approach, basically based 
on (possibly multivariate) GARCH models; see Fengler (2005b). 

From our data set we obtained 88 functional observations for the 1M group 
(^im) arid 125 observations for the 3M group (tl^m)- We will estimate the 
model on the interval for futures moneyness k G [0.8,1.1]. In comparison 
to Fengler, Hardle and Villa (2003), we may estimate models (16) and (17) 
on an arbitrary fine grid (we used an equidistant grid of 500 points on the 
interval [0.8,1.1]). For illustration, the Nadaraya- Watson (NW) estimator 
of resulting log-returns is plotted in Figure 2. The smoothing parameters 
have been chosen in accordance with the requirements in Section 2.2. As 
argued in Section 2.2, we should use small smoothing parameters in order 
to avoid a possible bias in the estimated eigenfunctions. Thus, we use for 
each i essentially the smallest bandwidth hi that guarantees that estimator 
Xi is defined on the entire support [0.8,1.1]. 

Using the procedures described in Section 2.1, we first estimate the eigen- 
functions of both maturity groups. The estimated eigenfunctions are plot- 
ted in Figure 1. The structure of the eigenfunctions is in accordance with 
other empirical studies on IV-surfaces. For a deeper discussion and econom- 
ical interpretation, see, for example, Fengler, Hardle and Mammen (2007) 
or Fengler, Hardle and Villa (2003). 

Clearly, the ratio of the variance explained by the fcth factor function is 
given by the quantity v\ M = A^/X^-LY f° r the 1M group and, corre- 
spondingly, by i>| M for the 3M group. In Table 3 we list the contributions of 
the factor functions. Looking at Table 3, we can see that 4th factor functions 
explain less than 1% of the variation. This number was the "threshold" for 
the choice of L±m and L2M- 

We can observe (see Figure 1) that the factor functions for both groups 
are similar. Thus, in the next step we use the bootstrap test for testing the 



Table 3 

Variance explained by the eigenfunctions 





Var. explained 1M 


Var. explained 3M 




89.9% 


93.0% 




7.7% 


4.2% 


PI 


1.7% 


1.0% 




0.6% 


0.4% 
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equality of the factor functions. We use 2000 bootstrap replications. The test 
of equality of the eigenfunctions was rejected for the first eigenfunction for 
the analyzed time period (January 2004-June 2004) at a significance level 
a = 0.05 (P-value 0.01). We may conclude that the (first) factor functions are 
not identical in the factor model for both maturity groups. However, from 
a practical point of view, we are more interested in checking the appropri- 
ateness of the entire models for a fixed number of factors: L = 2 or L = 3 in 
(16) and (17). This requirement translates into the testing of the equality of 
eigenspaces. Thus, in the next step we use the same setup (2000 bootstrap 
replications) to test the hypotheses that the first two and first three eigen- 
functions span the same eigenspaces £\ M and £f M . None of the hypotheses 
for L = 2 and L = 3 is rejected at significance level a = 0.05 (P-value is 0.61 
for L = 2 and 0.09 for L = 3). Summarizing, even in the functional sense we 
have no significant reason to reject the hypothesis of common eigenspaces 
for these two maturity groups. Using this hypothesis, the factors governing 
the movement of the returns of IV surface are invariant to time to ma- 
turity, only their relative importance can vary. This leads to the common 
factor model: XJ («) = X T (k) + J2r=i /?ri)v( K )> * = 1, • • • , n T , r = 1M, 3M, 
where j r := "f^ M = 7^ M . Beside contributing to the understanding of the 
structure of the IV function dynamics, the common factor model helps 
us to reduce the number of functional factors by half compared to mod- 
els (16) and (17). Furthermore, from the technical point of view, we also 
obtain an additional dimension reduction and higher estimation precision, 
since under this hypothesis we may estimate the eigenfunctions from the 
(individually centered) pooled sample Xi(K) 1M ,i = 1, . . . ,n±M, Xf M = 
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1, .. . ,ri3M- The main improvement compared to the multivariate study by 
Fengler, Hardle and Villa (2003) is that our test is performed in the func- 
tional sense - it does not depend on particular discretization and our factor 
model can be evaluated on an arbitrary fine grid. 

APPENDIX: MATHEMATICAL PROOFS 

In the following, ||u|| = (Jq 1 v(t) 2 dt) 1 / 2 will denote the L 2 -norm for any 
square integrable function v. At the same time, ||a|| = (i J2i=i a i) 1 ^ 2 wm 
indicate the Euclidean norm, whenever a £ M fc is a A;- vector for some k € N. 

In the proof of Theorem 1, E £ and Var e denote expectation and variance 
with respect to e only (i.e., conditional on tij and Xj). 

Proof of Theorem \. Recall the definition of the Xi(f) an d note that 
Xi(t)=X?(t)+Xi(t), where 

as well as 

x?(t) = f:x i (t iU) )i(te 

for t e [0,1], t m = and t i( r i+1) = 2-t i{Ti) . Similarly, x*(t) = xf*(*) + 

xf{t). 

By Assumption 2, Efl^y) - ^(j.i)! 5 ) = 0(T~ S ) for s = 1, . . . ,4, and the 
convergence is uniform in j < n. Our assumptions on the structure of X% 
together with some straightforward Taylor expansions then lead to 

(Xi,Xj) = (X i ,X j ) + O p (l/T) 

and 

(Xi,xl) = \\X l \\ 2 + O p (l/T). 

Moreover, 

E e (( X f,xf))=0, E £ (|| X f|| 2 )= ( 7f, 
E £ ((xf,xf))=0, E e (( X !, X r) 2 ) = O p (l/n 

E £ (( X l X f) 2 )=O p (l/T), E £ ((x!,xf)(xl,X?)) = for i^k, 
E £ «xf,x|><xf,xl» =0 for j^k and E e (|| X f || 4 ) = O p (l) 

hold (uniformly) for all i,j = l,...,n. 

Consequently, E e (||x|| 2 - ||A|| 2 ) = O p {T~ l + n" 1 ). 
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When using these relations, it is easily seen that for all i, j = 1, . . . , n 

Mij - M i:j = O p (T~ 1/2 + n~ 1 ) and 

( 18 ) _ 

tr{(M - M) 2 } 1/2 = O p (l + nT' 1 / 2 ). 

Since the orthonormal eigenvectors p q of M satisfy = 1, we furthermore 
obtain for any i = 1, . . . ,n and all q = 1, 2, . . . 



(19) , M ir M r / 

,=i 1 ^ 



as well as 



Xl(t)xf(t)dt \ = O p (T- 1 / 2 + n~ 1 / 2 ), 



/ n 1 / 2 

( T l/2 



n -i 

(20) X>W Xf(t)xf(*)«ft = 1 
and 

n n Z" 1 /n 1 / 2 \ 

(21) E a *E?W XfWxfWd^Opf-ya) 
i=i j=i Jo \J- / 

for any further vector a with ||a|| = 1. 

Recall that the jth largest eigenvalue L satisfies n\j = L. Since by as- 
sumption inf s ^ r |A r — A s | > 0, the results of Dauxois, Pousse and Romain 
(1982) imply that A r converges to A r as n — > oo, and sup s j r ^ 1 - = Op(l), 

' | At A s | 

which leads to sup,^ r ^ ^ | = O p (l/n). Assertion (a) of Lemma A of 
Kneip and Utikal (2001) together with (18)-(21) then implies that 



(22) 



» l r 
A r — — 
n 



n-% -l r \ = n~ l \pl. {M - M)p T \ + OpiT' 1 +n~ l ) 
Op{(nTy l l 2 +T- l +n- 1 }. 



When analyzing the difference between the estimated and true eigenvec- 
tors p r andp r , assertion (b) of Lemma A of Kneip and Utikal (2001) together 
with (18) lead to 

(23) p r -pr = -S r (M -M)p r + K r , with \\K r \\ = Op(T~ 1 + n~ 1 ) 

and S r = J2 s ^r T^i;PsPl- Since sup|| || =1 a T S r a < sup s ^ r i^^y = O p (l/n), 
we can conclude that 



(24) \\p r -p r \\ = O p (T 



-1/2 



n 



and our assertion on the sequence n 1 J2i(Pn — Pri;T) 2 is an immediate con- 
sequence. 
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Let us now consider assertion (ii). The well-known properties of local lin- 
ear estimators imply that \E £ {Xi(t) — Xi(t)}\ = O p (b 2 ), as well as Var £ {Xj(t)} 
O p {Tb}, and the convergence is uniform for all i,n. Furthermore, due to the 
independence of the error term ey, Cov e {Xi(t), Xj(t)} = for i ^ j. There- 
fore, 



7r(*) 



^2p ir Xi(t) 



i=l 



OJb 2 + 



1 



VnTbJ 



On the other hand, (18)-(24) imply that with X(t) = (Xi(t), . . .,X n (t)) T 
1 



7r;T(f) 



i=l 



—FF J2(Pir ~ Pir)Xi(t) + ~Jj=^2(Pir ~ Pir){Xi 
V'r i= i V'r i=1 



(t)-Xi(t)} 



+ O p {T~ 1 +n~ 1 ) 



\S r X(t)\\ 



pJ(M-M)S r 



X(t) 



\S r X 



n 



+ O p {b 2 T- 1/2 + T- 1 b~ 1/2 + < 
= Opin-^T- 1 ' 2 + b 2 T~ 1 / 2 + T-V 1 ^ + n" 1 ). 
This proves the theorem. □ 

Proof of Theorem 2. First consider assertion (i). By definition, 

| 7r (t). 



i=l r \ i=l / 

Recall that, by assumption, j3 r i are independent, zero mean random variables 
with variance A r , and that the above series converges with probability 1. 
When defining the truncated series 



r=l 



i=l 



standard central limit theorems therefore imply that y/nV(q) is asymptoti- 
cally N(0,J2t=i ^rjr{t) 2 ) distributed for any possible gGN. 

The assertion of a iV(0, J2r=i K^fr(t) 2 ) limiting distribution now is a 
consequence of the fact that for all 5\ , 62 > there exists a 5,5 such that 
P{\y/nV{q) - \MEr> _1 £i=i A-i)7r(*)| > Si} < 5 2 for all q > q 5 and all n 
sufficiently large. 
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In order to prove assertions (i) and (ii), consider some fixed r€ {1,2,...} 
with A r _i > A r > A r+ i. Note that T as well nuclear, self-adjoint and 

non-negative linear operators with Tv = J a(t, s)v(s) ds and T n v = 
J a(t, s)v(s) ds, v £ L 2 [0, 1]. For m£N, let U m denote the orthogonal projec- 
tor from L 2 [0, 1] into the m-dimensional linear space spanned by {71, . . . , 7 m }, 
that is, ILmV = YllLi ip, 7^)7^ , v G L 2 [0, 1] . Now consider the operator H m Y n IL r 
as well as its eigenvalues and corresponding eigenfunctions denoted by Ai. m > 
A2,m >■■ and 7i, m , 72,m, • • • , respectively. It follows from well-known re- 
sults in the Hilbert space theory that U m r n U m converges strongly to T n as 
m — > 00. Furthermore, we obtain (Rayleigh-Ritz theorem) 

(25) lim A rm = A r and lim \\^f r — % m \\ = if A r _i > X r > A r +i. 



m—foo 



Note that under the above condition 7 r is uniquely determined up to sign, 
and recall that we always assume that the right "versions" (with respect 
to sign) are used so that (7r,7r,m) > 0- By definition, f3ji = J ^j{t){Xi{t) — 
fjL(t)}dt, and therefore, / 7j-(i){Xj(i) — X(t)}dt = j3ji — /3j, as well as AQ — 
X = J2j{Pji ~ 0j)ljt where (3j = -J2iLi Pji- When analyzing the structure 
of n m r n IT m more deeply, we can verify that II m r n n m u = / a m (t, s)v(s) ds, 
v e L 2 [0,1], with 

o"m(*, s) = 9m(t) T T, m g m (s), 

where g m (t) = (71 (i), . . . ,7m(0) T > an d where S m is the m x m matrix with 
elements {^Y2=i(Pji- (3j)(0ki-f3k)}j,k=i,...,m- Let Ai(£ m ) > A 2 (S m ) > • > 
A m (E m ) and Ci,m, ■ ■ ■ , Cm,m denote eigenvalues and corresponding eigenvec- 
tors of S m . Some straightforward algebra then shows that 

(26) A rfn — A r (£ m ), 7r,m — 9m(t) Cr,m- 

We will use S m to represent the m x m diagonal matrix with diagonal 
entries Ai > • • • > \ m . Obviously, the corresponding eigenvectors are given 
by the m-dimensional unit vectors denoted by e\ m ,..., e m>m . Lemma A of 
Kneip and Utikal (2001) now implies that the differences between eigenval- 
ues and eigenvectors of S m and S m can be bounded by 



(27) 



Ar,m A r — tr{e rjm e r m (£ m S m )}-|--R. 



(28) 



fi 6sup,| a || =1 a T (S m -S m ) 2 a 

with R r m < ^ — — , 

mm s I A s — X r I 

6sup N|=1 a T (£ m -S m ) 2 a 
Wlth llRr > Jl - minJA,-AJ 2 ' 



28 M. BENKO, W. HARDLE AND A. KNEIP 

where S r ^ m = J2s=£r \ a —\ r e s,m^l^ m - 

Assumption 1 implies E(/? r ) = 0, Var(/3 r ) = , and with 8u = l, as well 
as 5ij = for i ^ j , we obtain 

E< sup a T (S m -S m ) 2 a 

l||a|| = l 

<E{tr[(£ m -£ m ) 2 ]} 



(29) 



{ m r i n i 2 1 

j,k=ii ,L t=i j j 

i n 

Tl . 
1=1 

^EE E {^})+^" 1 ) = ^ 1 

\ j k ) 



<e E 



for allm. Since tr{e rtm ej m (t m - J: m )} = ^J2i=i(Pn - Pr) 2 - K, (25), (26), 
(27) and (29) together with standard central limit theorems imply that 

1 n 

v n i=l 
1 n 

(30) = -7=E[(^) 2 -E{(/?r,) 2 }] +O p (n' 1 / 2 ) 



n 



—* N{0, A r ). 

It remains to prove assertion (hi). Relations (26) and (28) lead to 



(t)-7r(*)=Sm(*) (Cr 



(31) 



s^r 



n(X s ~ A r ) ^ 



X;(0«-&)(A<-A-m.(t) 



+ 5 f m(^) Rr.mi 



where due to (29) the function g m (t) T R* m satishes 
E(\\glR*\\)=E(\\R*\\) 



< 



6 



n min s | A s — A r | 2 I 



V j k / 
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for all m. By Assumption 1, the series in (31) converge with probability 1 

as m — > oo. 

Obviously, the event A r _i > A r > A r+ i occurs with probability 1. Since m 
is arbitrary, we can therefore conclude from (25) and (31) that 

7r(*)-7r(<) 



(32) 



E< 
£< 



1 



n(X s ~ Ar) ^ 
1 

n(A s - Ar) . = 



- &)(&* - fr) \ ls {t) + i? r (t) 
i=l J 



where = O p (n x ), as well as ||i2r|| = O p (n 1 ). Moreover, sjn x 

J2s^r{ n (x —A ) SE=i f3sif3ri}ls(t) is a zero mean random variable with vari- 

ance E g ^ r Es^r { \ q _ a") (a, - a, ) 7<? (*)7« (*) < °°- % Assumption 1, it follows 
from standard central limit arguments that for any g S N the truncated series 

=i,s^r[n(A a -A r ) ET=i @si@ri\ls(t) is asymptotically normal 
distributed. The asserted asymptotic normality of the complete series then 
follows from an argument similar to the one used in the proof of assertion 
(i). □ 

Proof of Theorem 3. The results of Theorem 2 imply that 

ni 



nAi 



E 



(33) 



E 



E^^W) dt. 



-a) 



Furthermore, independence of X 4 - ±; and A| together with (30) imply that 



(34) 



^[A«-AW-{A( 2 )-A( 2 )}]^iv(o 



A. 



(i) 



+ 



A 



(2) 



92 



and 



n 



k { P/q 1 + Lf ) /q 2 



A £ 2 



Furthermore, (32) leads to 



nA 



2,r 



(35) 



E 



i 



(1) E^si A 



E 



1 



^ I V?2™2(Ai 2) - A^ 2) ) i=1 



+ O p (n 
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lP (*){#(«)- 7^ («)} 



,r=l 



+ 7 W(„){ 7 W(t)- 7 W(t)} 



+ 7^H#(*)-7^)} 



dtdu + (IT, (n^ 1 / 2 ) 



(36) 



EE 

r=l s>L 



in 



L 

EE 

r 



.2. 

=ls>L I 



x{ 7 ( 1 )(t) 7 ( 1 )(n)+ 7 ( 1 )(n) 7 ( 1 )(t)} 



1 



"2 



_W 2) /3 (2) 



x{7^ 2) (i)7i 2) W + 7^ 2) (^)7i 2) W} 



-. 2 



dtdu 



-l/2> 



In order to verify (36), note that E^=iEs=i,sA 



for 



p = 1, 2 and all possible sequences a±, . . . , a/,. It is clear from our assumptions 
that all sums involved converge with probability 1. Recall that E(/3^ ) /3j ) ) = 
0, p = 1, 2 for r ^ s. 



It follows that X 



- ^ £ s ^ 3^t^k7i P) , p = 1, 2, is a continu- 



?"(P)||2> 



< 



ous, zero mean random function on L [0, 1], and, by assumption, E(||X 

oo. By Hilbert space central limit theorems [see, e.g., Araujo and Gine (1980)], 

~ (p) (v) 
X y /' thus converges in distribution to a Gaussian random function Q- as 

n — ► oo. Obviously, £r is independent of We can conclude that nA^L 
possesses a continuous limit distribution F^ l defined by the distribution 

Of // Er=l {tP (th^ (U) + (U) 7r (1) (t)} - Er=l {tP (thP (U) + ^ (U) X 

'ji 2 \t)}] 2 dtdu. Similar arguments show the existence of continuous limit 
distributions F\ and F2 :T of nA\ and nA2, r - 

For given q E N, define vectors = } , . . . , 0$ , ) T G R«, 6^ = 

KPli Pri ) • • • ) Pr-l,iPri ' Pr+l,iPri > ■ ■ ■ > Pqi P r i ) G K aIld - (Pij P 2 i > 
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. . . ,p?' ) T € U.( q ~^ L . When the infinite sums over r in (33), respectively 
s ^ r in (35) and (36), are restricted to q € N components (i.e., J2r an d J2s>L 
are replaced by J2 r <q an d J2L<s<q)> then the above relations can generally 
be presented as limits nA = lim^oo nA(q) of quadratic forms 

m 



( 1 



nA x (q) 



m 
1 



/ 1 



8=1 
i=l 



/ 1 



0? 



(37) 



nA 2 , r (g) 



l 



raA 4iL (g) 



"2 
1 



/ 



ni \ 

1=1 
ri2 

E4 2) 

i=l 
ni 



"1 
1 



£<4 lA 

8=1 

•n.2 

,(2) 



/ 1 



Q 2 



»2 

1 

"1 



1 



1=1 



E*S A 



i=i 
"2 



m 
1 



V v 7 ^ 



E€ A 

i=i 

E 6 

i=i 



(2) 
£3 



Ql 



no 
1 



E4 2 ' 



8=1 
"1 



"1 
1 



V V^2 



E4 lA 

8=1 

E^ 

8=1 



(2) 
i3 



where the elements of the 2q x 2q, 2(q — l) x 2(q — l) and 2L(q — 1) x 2L(q— 1) 
matrices Q^, an d Q3 can be computed from the respective (^-element) 
version of (33)-(36). Assumption 1 implies that all series converge with 
probability 1 as q — > 00, and by (33)-(36), it is easily seen that for all e, 6 > 
there exist some q(e,S),n(e,5) E N such that 



(38) 



P{\nA 1 - nAi(q)\ > e) < S, P{\nA 2 , r - nA 2 , r (q)\ > e) < 5, 
P(|nA 4 ,L-nA 4 ,L(g)|>e)<5 

hold for all g > g(e, (5) and all n > n(e, <5). For any given q, we have E(6ji) = 
E(6j 2 ) — E(6j3) = 0, and it follows from Assumption 1 that the respective 
covariance structures can be represented by finite covariance matrices fl\ q , 
^2,q and ^3^. It therefore follows from our assumptions together with stan- 

J ik 



dard multivariate central limit theorems that the vectors {-7= 



zCr=i(^k ) T } T ' ^ = I5 2, 3, are asymptotically normal with zero means 
and covariance matrices Q\ >q , Q 2jg an d ^3,9- One can thus conclude that, as 



n 



00, 



(39) nAi(g)AFi 



nA 2 , r (g) -> F 2 , r 



where i^g, i^.r.gj ^i,L,q denote the continuous distributions of the quadratic 
forms ^7 Qi^i, Q|z2, Q3Z3 with zi ~ 2V(0,fii jg ), z 2 ~ iV(0, r2 2j(? ), Z3 ~ 
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N(0, f^g). Since e,5 are arbitrary, (38) implies 



(40) 



lim Fi, = Fi, 



lim F 2tT ,q = F 2>r , 
q— >oo 



lim F& L q = F A L . 

q— too 



We now have to consider the asymptotic properties of bootstrapped eigen- 
values and eigenfunctions. Let JfW* = i E^i , A? * = / lr P) {t){x\ p> (t) 



tit)}, = ^ErW*- and note that - = 

— /?r *. When considering unconditional expectations, our assumptions 
imply that for p = 1, 2 

E[($ p) *) 2 ] 



0. 

x (p) 



E[(/? r ( f) 2 ] = A^, 
E{[(/# } * ) 2 -A r W] 2 } = A r W 



(41) 



E E 



j,/c=i 



i 



E(A?*-^r)(/3, 



(?)* V/?(p)* 



A-/ 







(p)* 



(p) 



p i=i 



"p \ i m ! 



l^k 

One can infer from (41) that the arguments used to prove Theorem 1 
can be generalized to approximate the difference between the bootstrap 
eigenvalues and eigenfunctions Ar , 7^ and the true eigenvalues , 
7 r P \ All infinite sums involved converge with probability 1. Relation (30) 
then generalizes to 



;(aW*-aW)-v^(aW-aW) 



(42) 



1 n P 

^E(#*-^*) 2 
n p 1=1 



p 1=1 



1 



E (A 



"P i=l 

Similarly, (32) becomes 



1 ™ p ~l 

^-E(*) 2 +<w 1/2 )- 



7, 



'7, 



(p) 



(43) 



: 7 



(p)* 



7 



(p) 



7r 



(ph 
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- £ 7(pt-tm ^ £(^f - - ~ti p> ) 

sjLr I As - A^' n V i= l 

i -1 n p , 

As - Ar ' J P »=1 J 



s^r {As- K' H P i=l \ U P k=l J ) 



where due to (28), (29) and (41), the remainder term satisfies ||i?r P ^*|| = 
O p {n~ l ). 

We are now ready to analyze the bootstrap versions A* of the different 
A. First consider A^ r and note that {O^f*) 2 } are i.i.d. bootstrap resam- 
ples from {{(i^) 2 }- It therefore follows from basic bootstrap results that 



the conditional distribution of -L= E^iPifT ~ h Efcii(/^?) 2 ] § iven X i 



converges to the same iV(0,A^) limit distribution as -4= Y^Li[{fi^i ) 2 



p 



E {(/ ? ri ) ) 2 }]- Together with the independence of (pl} ] *) 2 and (P^*) 2 , the 
assertion of the theorem is an immediate consequence. 

Let us turn to A*, A^ and A* AL . Using (41)~(43), it is then easily seen 
that nA\, nA 2 r and nA^ L admit expansions similar to (33), (35) and (36), 

when replacing there -j^YZiPri ^ ^E>i(4f " £ E£i /3# ) , as 
well as ^E^x^W by ^T&!&> ~ i E^x tf/^)- 

Replacing (3^' , by pfi* , pfi* l ea ds to bootstrap analogs o^* of 
the vectors 6^, k = 1,2,3. For any gGN, define bootstrap versions nA\(q), 
nA2 r (<?) and nA* AL (q) of nAi(q), nA 2i r(q) and nA 4j ^(g) by using 

ffiiPff* " £ £*Li ^) T ' * " £ ffil 6 i?) T ) instead of 

(^E^($V.^jE£i(&S?) T ). 1,2,3, in (37). Applying again (41)- 

(43) , one can conclude that for any e > there exists some q(e) such that, 
as n — ► oo, 

P(|nA5-nA5(g)|<e)->l, 

(44) P(|nA^-nA^( 9 )|<e)^l, 
P(\nAl L -nAX L (q)\<e)^l 
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hold for all q > q(e). Of course, (44) generalizes to the conditional probabil- 
ities given Xi, X?,. 

In order to prove the theorem, it thus only remains to show that for any 
given q and all S 

(45) \P(nA(q)>5)-P(nA*(q)>5\ X u X 2 )\ = O p {l) 

hold for either A(q) = Ai(g) and A*(q) = A|(g), A(q) = A 2 , r (q) and A*(q) = 
A* 2 , r (q), or A(q) = A^ L (q) and A*(q) = A\ L {q). But note that for k = 

1, 2, 3, E(6jfc) = 0, {b\{?*} are i.i.d. bootstrap resamples from {b^}, and 
E(b^)*\Xi,X2) = ^~J2kLib^ are the corresponding conditional means. It 
therefore follows from basic bootstrap results that as n — > oo the conditional 
distribution of (^ E&C&ff* " £ ££l &£?) T , ^ E^l(^f " £ Efcii ^) 
given A/l, ^2 converges to the same N(0,£lk, q ) limit distribution as 
Er=i(^) T 5 ^fc E"=i 5 (^V)- This obviously holds for all q G N, and 
(45) is an immediate consequence. The theorem then follows from (38), (39), 
(40), (44) and (45). □ 
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